Multilingual Aspects of Multiword Lexical Units
نویسندگان
چکیده
As most of the machine-readable dictionaries contain clearly insufficient information about multiword lexical units, there is a constant need to extend and tune specialized lexical databases to account for new expressions. In this paper, we present a system exclusively based on statistics that massively extracts from unrestricted text corpora contiguous and noncontiguous rigid multiword lexical units. For that purpose, a new association measure called the Mutual Expectation is conjugated with a new acquisition process based on an algorithm of local maxima. The system has been applied to a Portuguese, French, English and Italian parallel corpus and has evidenced that multiword lexical units embody a great deal of cross-language regularities.
منابع مشابه
On multiword lexical units and their role in maritime dictionaries
Multi-word lexical units are a typical feature of specialized dictionaries, in particular monolingual and bilingual maritime dictionaries. The paper studies the concept of the multi-word lexical unit and considers the similarities and differences of their selection and presentation in monolingual and bilingual maritime dictionaries. The work analyses such issues as the classification of multi-w...
متن کاملMining Multiword Terms from Wikipedia
The collection of the specialized vocabulary of a particular domain (terminology) is an important initial step of creating formalized domain knowledge representations (ontologies). Terminology Extraction (TE) aims at automating this process by collecting the relevant domain vocabulary from existing lexical resources or collections of domain texts. In this chapter, the authors address the extrac...
متن کاملPazienza University of Roma Tor Vergata , Italy Armando Stellato University of Roma Tor Vergata , Italy Semi - Automatic Ontology Development : Processes and Resources
The collection of the specialized vocabulary of a particular domain (terminology) is an important initial step of creating formalized domain knowledge representations (ontologies). Terminology Extraction (TE) aims at automating this process by collecting the relevant domain vocabulary from existing lexical resources or collections of domain texts. In this chapter, the authors address the extrac...
متن کاملUsing LocalMaxs Algorithm for the Extraction of Contiguous and Non-contiguous Multiword Lexical Units
The availability of contiguous and non-contiguous multiword lexical units (MWUs) in Natural Language Processing (NLP) lexica enhances parsing precision, helps attachment decisions, improves indexing in information retrieval (IR) systems, reinforces information extraction (IE) and text mining, among other applications. Unfortunately, their acquisition has long been a significant problem in NLP, ...
متن کاملMULTILINGUAL MULTIWORD EXPRESSIONS Literature Survey
Multiword Expressions are idiosyncratic word usages of a language which often have noncompositional meaning. The knowledge of multiword expressions is necessary for many NLP tasks like, machine translation, natural language generation, named entity recognition, sentiment analysis etc. In order for other NLP applications to benefit from the knowledge of multiword expressions, they need to be ide...
متن کامل